N=1:

rank | frequency | n-gram |
---|---|---|
1 | 149546 | -의 |
2 | 140963 | -는 |
3 | 88542 | -에 |
4 | 87015 | -을 |
5 | 80177 | -로 |

N=2:

rank | frequency | n-gram |
---|---|---|
1 | 39004 | -에서 |
2 | 35092 | -으로 |
3 | 19339 | -라는 |
4 | 16906 | -이다 |
5 | 14938 | -에게 |

N=3:

rank | frequency | n-gram |
---|---|---|
1 | 8958 | -에서는 |
2 | 8014 | -이라는 |
3 | 6490 | -로부터 |
4 | 5749 | -이라고 |
5 | 3736 | -이었다 |

N=4:

rank | frequency | n-gram |
---|---|---|
1 | 2546 | -으로부터 |
2 | 1494 | -하였으며 |
3 | 1290 | -'이라는 |
4 | 1199 | -함으로써 |
5 | 1140 | -하였으나 |

N=5:

rank | frequency | n-gram |
---|---|---|
1 | 729 | -ation |
2 | 267 | -으로부터의 |
3 | 175 | -하였으므로 |
4 | 168 | -라기보다는 |
5 | 161 | -시킴으로써 |

The tables show the most frequent letter n-grams at the ends of words for N=1…5. The procedure parallels 2.2.5 Most frequent word beginnings, except that the aim here is suffix detection rather than prefix detection.
For N=3:

    SELECT @pos := (@pos + 1), xx.*
    FROM (SELECT @pos := 0) r,
         (SELECT COUNT(*) AS cnt, CONCAT("-", RIGHT(word, 3))
          FROM words
          WHERE w_id > 100
          GROUP BY RIGHT(word, 3)
          ORDER BY cnt DESC) xx
    LIMIT 5;
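
The same counting can be sketched outside the database. The following Python snippet is only an illustration under stated assumptions: the function name `top_endings` and the toy word list are hypothetical, and the input is assumed to be a list of distinct word forms, one per row of the `words` table. As with `RIGHT(word, N)`, a word shorter than N characters is counted under its full form.

```python
from collections import Counter


def top_endings(words, n, limit=5):
    """Return (rank, count, "-ending") rows for the most frequent
    n-character word endings, analogous to the query for one value of N."""
    # word[-n:] yields the whole word when it is shorter than n characters,
    # which matches the behaviour of RIGHT(word, n).
    counts = Counter(word[-n:] for word in words)
    return [
        (rank, cnt, "-" + ending)
        for rank, (ending, cnt) in enumerate(counts.most_common(limit), start=1)
    ]


# Hypothetical toy word list, not taken from the corpus.
words = ["학교에서", "집에서", "서울에서", "학교로", "집으로", "서울로"]

for n in range(1, 4):
    print(n, top_endings(words, n, limit=2))
```

Python strings are indexed by code point, so each precomposed Hangul syllable counts as one character here, which is also the unit the tables use (for example, -에서 appears under N=2).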
2.2.5 Most frequent word beginnings